Supervised Machine Learning for Text Analysis in R by Emil Hvitfeldt and Julia Silge

Authors: Hvitfeldt, Emil; Silge, Julia
Language: eng
Format: epub
Publisher: CRC Press LLC
Published: 2021-09-23


In the last section, we fit the model one time to the training data as a whole. Now, to estimate how well that model performs, let's fit the model many times, once to each of these resampled folds, and then evaluate on the held-out part of each resampled fold.

We can extract the relevant information using collect_metrics() and collect_predictions().
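As a rough sketch of this resampling workflow, the code below fits a model to each fold and then collects the results. It assumes the tidymodels packages are installed. The simulated data set `toy`, the folds `toy_folds`, the workflow `toy_wf`, and the plain logistic regression model are all hypothetical stand-ins chosen for illustration; they are not the text data or model used in this chapter.

```r
library(tidymodels)

# Simulated binary classification data (illustrative only)
set.seed(123)
n <- 300
toy <- tibble(x1 = rnorm(n), x2 = rnorm(n)) %>%
  mutate(
    class = factor(
      if_else(x1 + 0.5 * x2 + rnorm(n) > 0, "yes", "no"),
      levels = c("yes", "no")
    )
  )

# Ten resampled folds of the data
toy_folds <- vfold_cv(toy, v = 10)

# A workflow bundling a formula preprocessor and a model specification
toy_wf <- workflow() %>%
  add_formula(class ~ x1 + x2) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# Fit once per fold and evaluate on the held-out part of each fold;
# save_pred = TRUE keeps the held-out predictions
toy_rs <- fit_resamples(
  toy_wf,
  toy_folds,
  control = control_resamples(save_pred = TRUE)
)

# Metrics averaged across folds (the exact set of default metrics
# can depend on the installed package versions)
toy_metrics <- collect_metrics(toy_rs)

# One row per held-out observation, with predicted class and probabilities
toy_predictions <- collect_predictions(toy_rs)
```

Because every observation is held out exactly once in this kind of cross-validation, `toy_predictions` here has one row for each row of `toy`.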

What results do we see, in terms of performance metrics?

The default performance metrics for binary classification are accuracy and ROC AUC (area under the receiver operating characteristic curve). For these resamples, the average accuracy is 80.2%.

Accuracy and ROC AUC are performance metrics used for classification models. For both, values closer to 1 are better.

Accuracy is the proportion of the data that is predicted correctly. Be aware that accuracy can be misleading in some situations, such as for imbalanced data sets.
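To see why accuracy can mislead on imbalanced data, consider a small base R sketch. The labels below are made up for illustration: 95 observations of one class and 5 of the other, scored against a "model" that always predicts the majority class.

```r
# Hypothetical imbalanced labels: 95 "no" and 5 "yes"
truth <- factor(c(rep("no", 95), rep("yes", 5)), levels = c("yes", "no"))

# A trivial classifier that always predicts the majority class
always_no <- factor(rep("no", 100), levels = c("yes", "no"))

# Accuracy: proportion of observations predicted correctly
accuracy <- mean(always_no == truth)

# Sensitivity: proportion of true "yes" cases that were found
sensitivity <- sum(always_no == "yes" & truth == "yes") / sum(truth == "yes")

accuracy     # 0.95
sensitivity  # 0
```

The trivial classifier reaches 95% accuracy while never identifying a single member of the minority class, which is why accuracy is best read alongside other metrics.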

ROC AUC measures how well a classifier performs at different thresholds. The ROC curve plots the true positive rate against the false positive rate; AUC closer to 1 indicates a better-performing model, while AUC closer to 0.5 indicates a model that does no better than random guessing.
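The idea of sweeping over thresholds can be sketched by hand in base R. The six labels and scores below are invented for illustration, and the trapezoidal rule is one common way to approximate the area; this is not necessarily how any particular package computes it.

```r
# Hypothetical true labels (1 = positive) and predicted probabilities
truth <- c(1, 1, 0, 1, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

# Candidate thresholds, from "predict nothing positive" down to the lowest score
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))

# True positive rate and false positive rate at each threshold
tpr <- sapply(thresholds, function(t) sum(score >= t & truth == 1) / sum(truth == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & truth == 0) / sum(truth == 0))

# Area under the curve via the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

For these six observations the area is 8/9, or about 0.89. That matches the proportion of positive-negative pairs in which the positive observation receives the higher score (8 of 9 pairs).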

Figure 7.1 shows the ROC curve, a visualization of how well a classification model can distinguish between classes, for our first classification model on each of the resampled data sets.
